We introduce SketchySGD, a stochastic quasi-Newton method that uses sketching to approximate the curvature of the loss function. Quasi-Newton methods are among the most effective algorithms in traditional optimization, where they converge much faster than first-order methods such as SGD. However, for contemporary deep learning, quasi-Newton methods are considered inferior to first-order methods like SGD and Adam owing to higher per-iteration complexity and fragility due to inexact gradients. SketchySGD circumvents these issues by a novel combination of subsampling, randomized low-rank approximation, and dynamic regularization. In the convex case, we show SketchySGD with a fixed stepsize converges to a small ball around the optimum at a faster rate than SGD for ill-conditioned problems. In the non-convex case, SketchySGD converges linearly under two additional assumptions, interpolation and the Polyak-Łojasiewicz condition, the latter of which holds with high probability for wide neural networks. Numerical experiments on image and tabular data demonstrate the improved reliability and speed of SketchySGD for deep learning, compared to standard optimizers such as SGD and Adam and existing quasi-Newton methods.
translated by Google Translate
ControlBurn is a Python package that builds feature-sparse tree ensembles to support nonlinear feature selection and interpretable machine learning. The algorithms in the package first build a large ensemble of trees that prioritizes basis functions with few features, and then select a feature-sparse subset of these basis functions using a weighted LASSO optimization criterion. The package includes visualizations to analyze the features selected by the ensemble and their influence on predictions. ControlBurn thus offers the accuracy and flexibility of tree-ensemble models together with the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can compute the regularization path (the prediction error for every number of selected features) for a dataset with tens of thousands of samples and hundreds of features using warm-start continuation. For larger datasets, the runtime scales linearly in the number of samples and features (up to log factors), and the package supports acceleration via sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and $\ell_0$-based regularizers. The package is user-friendly and open source: its documentation and source code appear at https://pypi.org/project/controlburn/ and https://github.com/udellgroup/controlburn/.
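The pipeline described above (grow an ensemble biased toward few-feature trees, then prune it with a weighted LASSO) can be sketched from scratch with scikit-learn. This is an illustrative re-implementation of the idea, not the ControlBurn package's own API:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: only features 0 and 1 matter, nonlinearly.
n, p = 500, 10
X = rng.standard_normal((n, p))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Step 1: grow an ensemble biased toward trees that use few features,
# here by fitting each shallow tree on a small random feature subset.
trees, feats_used = [], []
for _ in range(60):
    S = rng.choice(p, size=rng.integers(1, 3), replace=False)
    t = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X[:, S], y)
    used = {S[f] for f in t.tree_.feature if f >= 0}   # features actually split on
    trees.append((t, S))
    feats_used.append(used)

# Step 2: weighted LASSO over the tree-prediction basis; a tree's penalty
# weight is the number of features it uses (absorbed by column scaling).
A = np.column_stack([t.predict(X[:, S]) for t, S in trees])
w = np.array([max(len(u), 1) for u in feats_used], dtype=float)
lasso = Lasso(alpha=0.1).fit(A / w, y)

# Step 3: the selected features are those used by trees kept by the LASSO.
selected = sorted(set().union(*[feats_used[i]
                                for i in np.flatnonzero(lasso.coef_)]))
print(selected)  # informative features 0 and 1 survive; noise trees are pruned
```

Trees built on pure-noise features predict the target poorly, so the LASSO zeroes them out, and the surviving trees touch only a sparse set of features.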
Learning invariant representations is an important requirement when machine learning models are trained on datasets with spurious correlations. These spurious correlations, between input samples and target labels, wrongly direct the neural network predictions, leading to poor performance on certain groups, especially minority groups. Robust training against these spurious correlations requires the group membership of every sample. Such a requirement is significantly laborious when data labeling for minority or rare groups takes substantial effort, or when the individuals comprising the dataset choose to conceal sensitive information. On the other hand, the presence of such data-collection efforts results in datasets that contain partially labeled group information. Recent work has addressed the fully unsupervised scenario where no labels for groups are available. We thus aim to fill the missing gap in the literature by addressing a more realistic setting that can leverage partially available sensitive or group information during training. First, we construct a constraint set and derive a high-probability bound for the group assignment to belong to that set. Second, we propose an algorithm that optimizes for the worst-case group assignment from the constraint set. Through experiments on image and tabular datasets, we show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
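A toy sketch of the worst-case idea: if a high-probability bound restricts how many unlabeled samples can belong to the minority group, the most adverse assignment in that constraint set places the highest-loss unlabeled samples there. This only illustrates the principle; the paper's constraint set and optimization algorithm are derived formally and differ in detail:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-sample losses from some model; group labels known only for a subset.
losses = rng.exponential(1.0, size=100)
known_group = np.full(100, -1)              # -1 marks an unlabeled sample
known_group[:60] = rng.integers(0, 2, 60)   # 60 samples carry group labels

# Constraint set: suppose a high-probability bound says exactly k of the
# unlabeled samples belong to the minority group (group 1).
k = 10
unlabeled = np.flatnonzero(known_group == -1)

def worst_case_group1_loss(losses, known_group, unlabeled, k):
    """Most adverse assignment in the constraint set: the k highest-loss
    unlabeled samples are assigned to group 1, maximizing its mean loss."""
    adversarial = unlabeled[np.argsort(losses[unlabeled])[-k:]]
    members = np.concatenate([np.flatnonzero(known_group == 1), adversarial])
    return losses[members].mean()

robust = worst_case_group1_loss(losses, known_group, unlabeled, k)
print(robust)  # an upper bound on group 1's loss over the constraint set
```

Training against this worst-case group loss (rather than the average loss) is what protects the minority group when its membership is only partially observed.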
The nuclear norm and Schatten-$p$ quasi-norm are popular rank surrogates in low-rank matrix recovery. Unfortunately, computing the nuclear norm or Schatten-$p$ quasi-norm of a tensor is NP-hard, which is a pity for low-rank tensor completion (LRTC) and tensor robust principal component analysis (TRPCA). In this paper, we propose a new class of tensor rank regularizers based on the Euclidean norms of the CP component vectors of a tensor, and show that these regularizers are monotonic transformations of the tensor Schatten-$p$ quasi-norm. This connection enables us to minimize the Schatten-$p$ quasi-norm in LRTC and TRPCA. The methods do not use singular value decomposition and therefore scale to large tensors. Moreover, the methods are not sensitive to the choice of the initial rank and, compared to the nuclear norm, provide an arbitrarily sharper rank proxy for low-rank tensor recovery. We further study the generalization ability of LRTC with the Schatten-$p$ quasi-norm regularizer and LRTC with the proposed regularizers. The theorems show that a relatively sharper regularizer leads to a tighter error bound, which is consistent with our numerical results. Numerical results on synthetic data and real data demonstrate the effectiveness and superiority of our methods compared with baseline methods.
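The matrix case illustrates the connection between component-vector norms and Schatten norms: for a balanced SVD-based factorization into rank-one components, the sum of products of the components' Euclidean norms equals the nuclear norm (the Schatten-1 norm). A small NumPy check of this special case:

```python
import numpy as np

rng = np.random.default_rng(0)

# Write X as a sum of rank-one components X = sum_r a_r b_r^T and measure
# rank via the Euclidean norms of the component vectors a_r, b_r.
X = rng.standard_normal((8, 5)) @ rng.standard_normal((5, 6))  # rank <= 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Balanced factorization: a_r = sqrt(s_r) * u_r, b_r = sqrt(s_r) * v_r.
A = U * np.sqrt(s)
B = Vt.T * np.sqrt(s)

# The component-norm regularizer sum_r ||a_r|| * ||b_r|| recovers the
# nuclear norm (sum of singular values) at this factorization.
reg = np.sum(np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))
nuc = s.sum()
print(abs(reg - nuc) < 1e-8)  # True
```

No SVD is needed to *evaluate* such a regularizer once CP factors are available, which is what lets the tensor methods scale; the SVD here only serves to verify the identity.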
Missing value imputation is crucial for real-world data science workflows. Imputation in the online setting is even more difficult, because it requires the imputation method itself to be able to evolve over time. For practical applications, an imputation algorithm should produce imputations that match the true data distribution, handle data of mixed types, including ordinal, boolean, and continuous variables, and scale to large datasets. In this work, we develop a new online imputation algorithm for mixed data using the Gaussian copula. The online Gaussian copula model meets all the desiderata: its imputations match the data distribution even for mixed data, it improves on the accuracy of its offline counterpart when the streaming data has a changing distribution, and it matches the speed of the offline counterpart (with up to an order-of-magnitude speedup), particularly on large-scale datasets. By fitting the copula model to online data, we also provide a new method to detect change points in the multivariate dependence structure in the presence of missing values. Experimental results on synthetic and real-world data validate the performance of the proposed methods.
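A minimal sketch of (offline) Gaussian copula imputation for continuous columns: transform observed entries to normal scores, estimate the latent correlation, impute each missing entry by its conditional Gaussian mean, and map back through empirical quantiles. The paper's online algorithm updates the model incrementally and also handles ordinal and boolean margins; none of that is shown here:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Correlated data with a non-Gaussian margin in the last column.
n = 400
t = rng.standard_normal(n)
X = np.column_stack([t + 0.1 * rng.standard_normal(n),
                     t + 0.1 * rng.standard_normal(n),
                     np.exp(t) + 0.1 * rng.standard_normal(n)])
mask = np.zeros(X.shape, dtype=bool)
mask[rng.random(n) < 0.3, 2] = True          # hide 30% of the last column
X_obs = np.where(mask, np.nan, X)
obs_vals = [np.sort(X_obs[~np.isnan(X_obs[:, j]), j]) for j in range(3)]

# 1. Copula transform: map each column's observed entries to normal scores.
Z = np.full(X.shape, np.nan)
for j in range(3):
    o = ~np.isnan(X_obs[:, j])
    ranks = X_obs[o, j].argsort().argsort() + 1.0
    Z[o, j] = norm.ppf(ranks / (o.sum() + 1))

# 2. Estimate the latent correlation matrix from complete rows.
complete = ~np.isnan(Z).any(axis=1)
Sigma = np.corrcoef(Z[complete].T)

# 3. Impute each missing latent score by its conditional Gaussian mean,
#    then map back through the column's empirical quantile function.
for i in np.flatnonzero(~complete):
    O = ~np.isnan(Z[i])
    M = ~O
    z_m = Sigma[np.ix_(M, O)] @ np.linalg.solve(Sigma[np.ix_(O, O)], Z[i, O])
    for j, z in zip(np.flatnonzero(M), np.atleast_1d(z_m)):
        X_obs[i, j] = np.quantile(obs_vals[j], norm.cdf(z))

rmse = np.sqrt(np.mean((X_obs[mask[:, 2], 2] - X[mask[:, 2], 2]) ** 2))
print(rmse)  # small relative to the spread of the hidden column
```

Because the copula separates the (non-Gaussian) margins from the (Gaussian) dependence, the imputations respect the data distribution; here the nonlinear column is recovered far better than mean imputation would manage.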
This paper proposes a new variant of Frank-Wolfe (FW), called $k$FW. Standard FW suffers from slow convergence: iterates often zig-zag as update directions oscillate around extreme points of the constraint set. The new variant, $k$FW, overcomes this problem by using two stronger subproblem oracles in each iteration. The first is a $k$ linear optimization oracle ($k$LOO) that computes the $k$ best update directions (rather than one). The second is a $k$ direction search ($k$DS) that minimizes the objective over the constraint set represented by the $k$ best update directions and the previous iterate. When the problem solution admits a sparse representation, both oracles are easy to compute, and $k$FW converges quickly for smooth convex objectives and several interesting constraint sets: $k$FW achieves finite $\frac{4L_f^3D^4}{\gamma\delta^2}$ convergence on polytopes and group norm balls, and linear convergence on spectral and nuclear norm balls. Numerical experiments validate the effectiveness of $k$FW and demonstrate an order-of-magnitude speedup over existing approaches.
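The two oracles are easy to state concretely on the probability simplex, where the $k$LOO returns the vertices with the $k$ smallest gradient coordinates and the $k$DS is a small convex program over their hull. A toy sketch (using SciPy's SLSQP for the direction search, which the paper may solve differently):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Smooth convex objective over the probability simplex; the solution is sparse.
d, m, k = 30, 20, 5
A = rng.standard_normal((m, d))
x_star = np.zeros(d)
x_star[[2, 7, 11]] = [0.5, 0.3, 0.2]
b = A @ x_star

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)

x = np.ones(d) / d
for _ in range(30):
    g = grad(x)
    # kLOO: the k best simplex vertices (k smallest gradient entries).
    V = np.eye(d)[np.argsort(g)[:k]]               # k x d vertex matrix
    # kDS: minimize f over the convex hull of these vertices and the iterate.
    P = np.vstack([x, V])                          # (k+1) x d
    obj = lambda c: f(P.T @ c)
    res = minimize(obj, np.ones(k + 1) / (k + 1),
                   bounds=[(0, 1)] * (k + 1),
                   constraints={'type': 'eq', 'fun': lambda c: c.sum() - 1})
    x = P.T @ res.x

print(f(x))  # near zero: the sparse optimum lies in the hull of a few vertices
```

Because the current iterate is always included in the hull, each $k$DS step is at least as good as exact line search along the single FW direction, and once the few support vertices all appear among the $k$ best directions the method lands on the sparse solution directly.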
Prior work has identified a resilient phenomenon that threatens the performance of human-AI decision-making teams: overreliance, when people agree with an AI, even when it is incorrect. Surprisingly, overreliance does not reduce when the AI produces explanations for its predictions, compared to only providing predictions. Some have argued that overreliance results from cognitive biases or uncalibrated trust, attributing overreliance to an inevitability of human cognition. By contrast, our paper argues that people strategically choose whether or not to engage with an AI explanation, demonstrating empirically that there are scenarios where AI explanations reduce overreliance. To achieve this, we formalize this strategic choice in a cost-benefit framework, where the costs and benefits of engaging with the task are weighed against the costs and benefits of relying on the AI. We manipulate the costs and benefits in a maze task, where participants collaborate with a simulated AI to find the exit of a maze. Through 5 studies (N = 731), we find that costs such as task difficulty (Study 1), explanation difficulty (Study 2, 3), and benefits such as monetary compensation (Study 4) affect overreliance. Finally, Study 5 adapts the Cognitive Effort Discounting paradigm to quantify the utility of different explanations, providing further support for our framework. Our results suggest that some of the null effects found in literature could be due in part to the explanation not sufficiently reducing the costs of verifying the AI's prediction.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help automate or assist in this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain. We experiment with a BART-based summarization system, with promising early results. We formulate our summarization inputs and targets in both free-text and structured forms, and modify a recently proposed metric to assess the quality of the summaries generated by our system. Data and models are available at https://github.com/allenai/ms2